Significant progress has been made in training image classification networks on long-tailed data using robust training algorithms such as data re-sampling, re-weighting, and margin adjustment. These methods, however, ignore the impact of data imbalance on feature normalization. Because majority classes (head classes) dominate the estimation of statistics and affine parameters, internal covariate shifts within less-frequent categories are overlooked. To alleviate this problem, we propose a compound batch normalization method based on a Gaussian mixture. It models the feature space more comprehensively and reduces the dominance of head classes. In addition, a moving average-based expectation maximization (EM) algorithm is employed to estimate the statistical parameters of the multiple Gaussian distributions. However, the EM algorithm is sensitive to initialization and can easily become stuck in local minima where the Gaussian components continue to focus on majority classes. To tackle this issue, we develop a dual-path learning framework that employs class-aware split feature normalization to diversify the estimated Gaussian distributions, allowing the Gaussian components to fit the training samples of less-frequent classes more comprehensively. Extensive experiments on commonly used datasets demonstrate that the proposed method outperforms existing methods on long-tailed image classification.
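A minimal sketch of the idea, not the authors' implementation: it assumes scalar features, normalizes each value with every Gaussian component, and blends the results by the component responsibilities. The function names, the blending rule, and the `momentum` parameter are illustrative assumptions; the abstract does not specify them.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """E-step: posterior probability of each mixture component for x."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    if total == 0.0:  # numerical underflow: fall back to a uniform posterior
        return [1.0 / len(dens)] * len(dens)
    return [d / total for d in dens]

def compound_normalize(x, weights, means, variances, eps=1e-5):
    """Normalize x with every component, then blend by responsibility (assumed rule)."""
    resp = responsibilities(x, weights, means, variances)
    return sum(r * (x - m) / math.sqrt(v + eps)
               for r, m, v in zip(resp, means, variances))

def em_moving_average_update(batch, weights, means, variances, momentum=0.9):
    """One EM step on a mini-batch, merged into running estimates by a moving average."""
    resp = [responsibilities(x, weights, means, variances) for x in batch]
    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        nk = max(sum(r[k] for r in resp), 1e-12)  # soft count of component k
        mk = sum(r[k] * x for r, x in zip(resp, batch)) / nk
        vk = sum(r[k] * (x - mk) ** 2 for r, x in zip(resp, batch)) / nk
        new_w.append(momentum * weights[k] + (1 - momentum) * nk / len(batch))
        new_m.append(momentum * means[k] + (1 - momentum) * mk)
        new_v.append(momentum * variances[k] + (1 - momentum) * vk)
    return new_w, new_m, new_v
```

With a single component this reduces to ordinary batch normalization with running statistics, which is one way to sanity-check the sketch.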
translated by Google Translate
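Most existing methods for coping with noisy labels assume that the class distribution is well balanced, so they are poorly equipped for practical scenarios in which training samples are imbalanced. To this end, this paper makes an early effort to tackle image classification under both a long-tailed distribution and label noise. Existing noise-robust learning methods do not work well in this setting, because it is challenging to distinguish noisy samples from clean samples of tail classes. To address this problem, we propose a new learning paradigm that screens out noisy samples based on inferences on weakly and strongly augmented data, and we introduce a leave-noise-out regularization to eliminate the effect of the samples recognized as noisy. Furthermore, we incorporate a novel prediction penalty based on an online prior distribution to avoid bias toward head classes. Compared with existing long-tailed classification methods, this mechanism is better at capturing the degree of class fitting in real time. Exhaustive experiments demonstrate that the proposed method outperforms state-of-the-art algorithms that address distribution imbalance in long-tailed classification under noisy labels.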
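Although deep learning algorithms have been intensively developed for computer-aided tuberculosis diagnosis (CTD), they mainly rely on carefully annotated datasets, which consumes considerable time and resources. Weakly supervised learning (WSL), which uses coarse-grained labels to accomplish fine-grained tasks, has the potential to solve this problem. In this paper, we first propose a new large-scale tuberculosis (TB) chest X-ray dataset, namely the tuberculosis chest X-ray attribute dataset (TBX-ATT), and then establish an attribute-assisted weakly supervised framework that classifies and localizes TB by leveraging attribute information to overcome the insufficient supervision in WSL scenarios. Specifically, first, the TBX-ATT dataset contains 2000 X-ray images with seven kinds of attributes for TB relational reasoning, annotated by experienced radiologists. It also includes the public TBX11K dataset with 11200 X-ray images to facilitate weakly supervised detection. Second, we exploit a multi-scale feature interaction model for TB region classification and detection with attribute relational reasoning. The proposed model is evaluated on the TBX-ATT dataset and will serve as a solid baseline for future research. The code and data will be available at https://github.com/gangmingzhao/tb-attribute-weak-localization.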
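Improving the resolution of magnetic resonance (MR) image data is critical for computer-aided diagnosis and brain function analysis. Higher resolution helps capture more detailed content, but it typically leads to a lower signal-to-noise ratio and a longer scanning time. For this reason, MR image super-resolution has recently attracted wide interest. Existing works have built a wide range of deep models with conventional architectures based on convolutional neural networks (CNNs). In this work, to further advance this research field, we make an early effort to build a Transformer-based MR image super-resolution framework, carefully designed to exploit valuable domain prior knowledge. Specifically, we consider two-fold domain priors, namely the high-frequency structure prior and the inter-modality context prior, and establish a novel Transformer architecture, called the Cross-modality high-frequency Transformer (COHF-T), to introduce these priors into the super-resolution of low-resolution (LR) MR images. Experiments on two datasets show that COHF-T achieves new state-of-the-art performance.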
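Due to the constraints of imaging devices and the high cost of operation time, computed tomography (CT) scans are usually acquired with low intra-slice resolution. Improving the intra-slice resolution benefits disease diagnosis by both human experts and computer-aided systems. To this end, this paper builds a novel medical slice synthesis method to increase the slice resolution. Considering that ground-truth intermediate medical slices are always absent in clinical practice, we introduce an incremental cross-view mutual distillation strategy to accomplish this task in a self-supervised manner. Specifically, we model this problem from three different views. In this setting, the models learned from different views can distill valuable knowledge to guide each other's learning processes. We can repeat this process so that the models synthesize intermediate slice data with increasing slice resolution. To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on a large-scale CT dataset. Quantitative and qualitative comparisons show that our method outperforms state-of-the-art algorithms by clear margins.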
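Current weakly supervised semantic segmentation (WSSS) frameworks usually contain a separate mask-refinement model and a main semantic region mining model. These approaches involve redundant feature extraction backbones and biased learning objectives, making them computationally complex yet sub-optimal for the WSSS task. To solve this problem, this paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model. By sharing the feature extraction backbone, our model facilitates knowledge sharing between the two components while keeping computational complexity low. To encourage high-quality knowledge interaction, we propose a novel alternative self-dual teaching (ASDT) mechanism. Unlike conventional distillation strategies, the knowledge of the two teacher branches in our model is alternately distilled to the student branch through pulse width modulation (PWM), which generates a PW wave-like selection signal to guide the knowledge distillation process. In this way, the student branch can help prevent the model from falling into local minimum solutions caused by the imperfect knowledge provided by either teacher branch. Comprehensive experiments on PASCAL VOC 2012 and COCO-Stuff 10K demonstrate the effectiveness of the proposed alternative self-dual teaching mechanism as well as the new state-of-the-art performance of our approach.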
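As a rough illustration of a PWM-style selection signal, the sketch below switches between two teachers with a periodic square wave. The `period` and `duty_cycle` parameters and both function names are hypothetical; the abstract does not state how the wave is parameterized.

```python
def pwm_select(step, period, duty_cycle):
    """Return 0 (first teacher) during the 'high' part of each period, else 1."""
    phase = step % period
    return 0 if phase < duty_cycle * period else 1

def distillation_target(step, teacher_a_out, teacher_b_out, period=10, duty_cycle=0.5):
    """Pick which teacher's output the student is distilled from at this step."""
    if pwm_select(step, period, duty_cycle) == 0:
        return teacher_a_out
    return teacher_b_out

# Example: with period=4 and duty_cycle=0.5 the teachers alternate every two steps.
schedule = [pwm_select(s, period=4, duty_cycle=0.5) for s in range(8)]
```

Changing `duty_cycle` shifts how much of each period is given to the first teacher, which is the usual meaning of pulse width in PWM.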
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
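The objective described above can be sketched as follows. This is an illustrative simplification, not the CCGC loss itself: the exact form of the objective is not given in the abstract, so the `1 - cosine` positive term, the plain cosine negative term, and the function names are assumptions. Embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_guided_loss(positive_pairs, centers_view1, centers_view2):
    """Pull cross-view positives together and push different cluster centers apart.

    positive_pairs: list of (z1, z2) embeddings drawn from the same
        high-confidence cluster in the two views.
    centers_view1 / centers_view2: per-cluster center embeddings in each view.
    """
    # Positive term: small when cross-view similarity of positives is high.
    pos = [1.0 - cosine(z1, z2) for z1, z2 in positive_pairs]
    # Negative term: cross-view similarity between centers of different clusters.
    neg = [cosine(centers_view1[i], centers_view2[j])
           for i in range(len(centers_view1))
           for j in range(len(centers_view2))
           if i != j]
    return sum(pos) / len(pos) + sum(neg) / len(neg)
```

In this form the loss is zero when positives are perfectly aligned across views and the centers of different clusters are orthogonal.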
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient for building an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes the important frames in the demonstrations.
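A minimal sketch of the masking loop, under stated assumptions: `train_and_evaluate` is a hypothetical callable that retrains the IL model on the kept frames and returns the environment return, and the per-frame averaging is borrowed from RISE-style saliency rather than confirmed by the abstract.

```python
import random

def importance_map(num_frames, train_and_evaluate, num_rounds=200, keep_prob=0.5, seed=0):
    """Average the return of every retraining run in which a frame was kept."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames  # sum of returns over runs that kept the frame
    counts = [0] * num_frames      # number of runs that kept the frame
    for _ in range(num_rounds):
        # Random binary mask over demonstration frames (1 = keep, 0 = mask out).
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = train_and_evaluate(mask)  # black-box retraining plus evaluation
        for i, kept in enumerate(mask):
            if kept:
                weighted[i] += ret
                counts[i] += 1
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]

# Toy stand-in for retraining: the policy only succeeds if frame 2 is available.
def toy_evaluator(mask):
    return 1.0 if mask[2] else 0.0

scores = importance_map(num_frames=5, train_and_evaluate=toy_evaluator)
```

In the toy example frame 2 receives the highest score, since every run that kept it earned the full return.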
Research interest in sequential recommender systems, which aim to model dynamic sequence representations precisely, is growing. However, the loss functions most commonly used in state-of-the-art sequential recommendation models have essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by extensive negative sampling and prediction biases; Binary Cross-Entropy (BCE) loss is subject to the number of negative samples, so it is likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss focuses only on the last timestamp of the training sequence, which makes poor use of sequence information and results in inferior user sequence representations. To avoid these limitations, in this paper we propose to calculate a Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, and it enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. With CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and is superior to that of other loss functions throughout almost the whole training process.
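The contrast between last-step CE and a loss accumulated over the sequence can be sketched as below. This is an assumption-laden illustration: it takes the softmax over the full item set (so no negative sampling is needed) and averages the per-step losses, while the abstract does not say whether CCE sums or averages. The function names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over the full item set."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def last_step_ce(step_logits, targets):
    """Cross-entropy computed only at the final position of the sequence."""
    return -log_softmax(step_logits[-1])[targets[-1]]

def cumulative_ce(step_logits, targets):
    """Cross-entropy accumulated over every position (averaged here by assumption)."""
    losses = [-log_softmax(logits)[t] for logits, t in zip(step_logits, targets)]
    return sum(losses) / len(losses)

# step_logits[t] holds the scores over all items after seeing the first t+1 items;
# targets[t] is the index of the item that actually came next.
step_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 1, 2]
loss_all_steps = cumulative_ce(step_logits, targets)
loss_last_only = last_step_ce(step_logits, targets)
```

Every position contributes a gradient in `cumulative_ce`, whereas `last_step_ce` discards the supervision available at earlier positions.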
Face Anti-spoofing (FAS) is essential for securing face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In such scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) an Image Quality Variable module (IQV) is introduced to recover discriminative image information by incorporating a super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation; (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.